Testing a Word Analysis System for Reliable and Sense-Conveying Hyphenation and Other Applications

Authors

  • Martin Schönhacker
  • Gabriele Kodydek
Abstract

In this article, we present a test environment for a word analysis system that is used for reliable and sense-conveying hyphenation of German words. A crucial task is the hyphenation of compound words, a huge number of which can readily be formed from existing words. Because of this, testing and checking all existing words for correct hyphenation is infeasible. We have therefore developed special test methods for large text files that filter the few problematic cases out of the complete set of analyzed words. These methods include detecting unknown or ambiguous words, comparing the output of different versions of the word analysis system, and selecting dubious words according to other special criteria. The test system is also suited to testing other applications that are based on word analysis, such as full text search.
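The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' test system: the analyzer interface (a function returning every hyphenation found for a word), the function name filter_problem_words, and the toy lookup tables are all assumptions made for illustration. A word counts as unknown when the newer analyzer returns nothing, as ambiguous when it returns more than one hyphenation, and as changed when the two analyzer versions disagree.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical analyzer interface (an assumption, not the real system's API):
# maps a word to every hyphenation it finds. An empty list means "unknown",
# more than one entry means "ambiguous".
Analyzer = Callable[[str], List[str]]


def filter_problem_words(
    words: Iterable[str], old: Analyzer, new: Analyzer
) -> Dict[str, List[str]]:
    """Return only the words that need manual review, grouped by reason."""
    report: Dict[str, List[str]] = {"unknown": [], "ambiguous": [], "changed": []}
    for word in sorted(set(words)):  # analyze each distinct word once
        new_result = new(word)
        if not new_result:
            report["unknown"].append(word)
        elif len(new_result) > 1:
            report["ambiguous"].append(word)
        # Regression check: did the two analyzer versions disagree?
        if sorted(old(word)) != sorted(new_result):
            report["changed"].append(word)
    return report


# Toy lookup tables standing in for two versions of a real analyzer.
OLD_TABLE = {"Haustür": ["Haus-tür"], "Wachstube": ["Wach-stu-be"]}
NEW_TABLE = {
    "Haustür": ["Haus-tür"],
    "Wachstube": ["Wach-stu-be", "Wachs-tu-be"],  # guard room vs. wax tube
}


def old_analyzer(word: str) -> List[str]:
    return OLD_TABLE.get(word, [])


def new_analyzer(word: str) -> List[str]:
    return NEW_TABLE.get(word, [])


if __name__ == "__main__":
    sample = ["Haustür", "Wachstube", "Xyzzy", "Haustür"]
    print(filter_problem_words(sample, old_analyzer, new_analyzer))
```

Unproblematic words such as "Haustür" never reach the report, so the reviewer sees only the cases that need attention, which is the point of filtering a large text file rather than checking every word.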


Similar resources

A Word Analysis System for German Hyphenation, Full Text Search, and Spell Checking, with Regard to the Latest Reform of German Orthography

In text processing systems German words require special treatment because of the possibility to form compound words as a combination of existing words. To this end, a universal word analysis system will be introduced which allows an analysis of all words in German texts according to their atomic components. A recursive decomposition algorithm, following the rules for word flexion, derivation, a...


Si3Trenn and Si3Silb: Using the SiSiSi Word Analysis System for Pre-hyphenation and Syllable Counting in German Documents

We present two applications of a word analysis system for the German language: pre-hyphenation of documents in various formats, and counting the syllables of all words of a document. The Si3Trenn preprocessor provides pre-hyphenation for file formats allowing for soft hyphens (currently: plain text, LaTeX, RTF). It applies reliable, sense-conveying hyphenation (SiSiSi) to each word of the input t...


Automatic non-standard hyphenation in OpenOffice.org

The hyphenation algorithm of OpenOffice.org 2.0.2 is a generalization of TeX's hyphenation algorithm that allows automatic non-standard hyphenation by competing standard and non-standard hyphenation patterns. With the suggested integration of linguistic tools for compound decomposition and word sense disambiguation, this algorithm would also be able to do more precise non-standard and standard ...


Nonlinear Instability of Coupled CNTs Conveying Viscous Fluid

In the present study, nonlinear vibration of coupled carbon nanotubes (CNTs) in presence of surface effect is investigated based on nonlocal Euler-Bernoulli beam (EBB) theory. CNTs are embedded in a visco-elastic medium and placed in the uniform longitudinal magnetic field. Using von Kármán geometric nonlinearity and Hamilton’s principle, the nonlinear higher order governing equations are deriv...


Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) on electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are used widely for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...




Publication date: 2000